<!DOCTYPE html>
            
<HTML>
<HEAD>
<meta name="booktitle" content="Developing Applications With Objective Caml" >
 <meta charset="ISO-8859-1"><meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0">
<META name="GENERATOR" content="hevea 1.05-7 of 2000-02-24">
<META NAME="Author" CONTENT="Christian.Queinnec@lip6.fr">
<LINK rel=stylesheet type="text/css" href="videoc-ocda.css">
<script language="JavaScript" src="videoc.js"><!--
//--></script>
<TITLE>
 Exploring Objective CAML values from C
</TITLE>
</HEAD>
<BODY class="regularBody">
<A HREF="book-ora114.html"><IMG SRC ="previous_motif.gif" ALT="Previous"></A>
<A HREF="index.html"><IMG SRC ="contents_motif.gif" ALT="Contents"></A>
<A HREF="book-ora116.html"><IMG SRC ="next_motif.gif" ALT="Next"></A>
<HR>

<H2> Exploring Objective CAML values from C</H2>
<A NAME="sec-explor-IC"></A>
<A NAME="@concepts260"></A>
<A NAME="@concepts261"></A>
The machine representation of Objective CAML values differs from that of C
values, even for fundamental types such as integers. This is because the
Objective CAML garbage collector needs to record additional information in
values. Since Objective CAML values are represented uniformly, their
representations all belong to the same C type, named (unsurprisingly) <TT>value</TT>.<BR>
<BR>
When Objective CAML calls a C function, passing it one or several arguments,
those arguments must be decoded before using them in the C function.
Similarly, the result of this C function must be encoded before being
returned to Objective CAML.<BR>
<BR>
<A NAME="@fonctions364"></A>
<A NAME="@fonctions365"></A>
<A NAME="@fonctions366"></A>
<A NAME="@concepts262"></A>
These conversions (decoding and encoding) are performed by a
number of macros and C functions provided by the Objective CAML runtime
system. These macros and functions are declared in the include files
listed in figure <A HREF="book-ora115.html#fig-fich-C">12.3</A>. These include files are part of
the Objective CAML installation, and can be found in the directory
where Objective CAML libraries are installed<A NAME="text31" HREF="book-ora122.html#note31"><SUP><FONT SIZE=2>4</FONT></SUP></A>.<BR>
<BR>
<BLOCKQUOTE><DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV>
<DIV ALIGN=center>
<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=1>
<TR><TD  ALIGN=left NOWRAP>&nbsp;</TD>
<TD  ALIGN=left NOWRAP>&nbsp;</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>caml/mlvalues.h</TT></TD>
<TD  ALIGN=left NOWRAP>definition of the <TT>value</TT> type and basic value conversion macros.</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>caml/alloc.h</TT></TD>
<TD  ALIGN=left NOWRAP>functions for allocating Objective CAML values.</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>caml/memory.h</TT></TD>
<TD  ALIGN=left NOWRAP>macros for interfacing with the Objective CAML garbage collector.</TD>
</TR></TABLE>
</DIV>
<BR>
<DIV ALIGN=center>Figure 12.3: Include files for the C interface.</DIV><BR>

<A NAME="fig-fich-C"></A>
<DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV></BLOCKQUOTE><A NAME="toc151"></A>
<H3> Classification of Objective CAML representations</H3>
<A NAME="@fonctions367"></A>
An Objective CAML representation, that is, a C datum of type <TT>value</TT>, is
one of:
<UL>
<LI>
 an immediate value (represented as an integer);

<LI> a pointer into the Objective CAML heap;

<LI> a pointer pointing outside the Objective CAML heap.
</UL>
The Objective CAML heap is the memory area that is managed by the Objective CAML
garbage collector. C code can also allocate and manipulate data
structures in its own memory space, and communicate pointers to these data
structures to Objective CAML.<BR>
<BR>
Figure <A HREF="book-ora115.html#fig-macros-intro-C">12.4</A> shows the macros for classifying
representations and converting between C integers and their Objective CAML
representation.
<BLOCKQUOTE><DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV>
<DIV ALIGN=center>
<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=1>
<TR><TD  ALIGN=left NOWRAP>&nbsp;</TD>
<TD  ALIGN=left NOWRAP>&nbsp;</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Is_long(v)</TT></TD>
<TD  ALIGN=left NOWRAP>is <TT>v</TT> an Objective CAML integer?</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Is_block(v)</TT></TD>
<TD  ALIGN=left NOWRAP>is <TT>v</TT> an Objective CAML pointer?</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP>&nbsp;</TD>
<TD  ALIGN=left NOWRAP>&nbsp;</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Long_val(v)</TT></TD>
<TD  ALIGN=left NOWRAP>extract the integer contained in <TT>v</TT>, as
a C "long"</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Int_val(v)</TT></TD>
<TD  ALIGN=left NOWRAP>extract the integer contained in <TT>v</TT>, as
a C "int"</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Bool_val(v)</TT></TD>
<TD  ALIGN=left NOWRAP>extract the boolean contained in <TT>v</TT> (0
if <TT>false</TT>, non-zero if <TT>true</TT>)</TD>
</TR></TABLE>
</DIV>
<BR>
<DIV ALIGN=center>Figure 12.4: Classification of representations and conversion of immediate
values.</DIV><BR>

<A NAME="fig-macros-intro-C"></A>
<DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV></BLOCKQUOTE>
Note that C offers several integer types of varying sizes
(<TT>short</TT>, <TT>int</TT>, <TT>long</TT>, etc.), while Objective CAML 
has only one integer type, <TT>int</TT>.<BR>
<BR>
<A NAME="toc152"></A>
<H3> Accessing immediate values</H3>
<A NAME="sec-acces-imm"></A>
<A NAME="@concepts263"></A>
All Objective CAML immediate values are represented as integers:
<UL>
<LI>
 integers are represented by their value;

<LI> characters are represented by their ASCII code<A NAME="text32" HREF="book-ora122.html#note32"><SUP><FONT SIZE=2>5</FONT></SUP></A>;

<LI> constant constructors are represented by an integer
corresponding to their position in the datatype declaration:
the n<SUP><FONT SIZE=2>th</FONT></SUP> constant constructor of a datatype is
represented by the integer <I>n</I>-1.
</UL>The following program defines a C function <TT>inspect</TT> that
inspects the representation of its argument:<BR>

<PRE>
<CODE>#include &lt;stdio.h&gt;</CODE><BR><CODE>#include &lt;caml/mlvalues.h&gt;</CODE><BR><CODE>value inspect (value v)</CODE><BR><CODE>{</CODE><BR><CODE>  if (Is_long(v))</CODE><BR><CODE>    printf ("v is an integer (%ld) : %ld", (long) v, Long_val(v));</CODE><BR><CODE>  else if (Is_block(v))</CODE><BR><CODE>    printf ("v is a pointer");</CODE><BR><CODE>  else</CODE><BR><CODE>    printf ("v is neither an integer nor a pointer (???)");</CODE><BR><CODE>  printf("   "); </CODE><BR><CODE>  fflush(stdout) ;</CODE><BR><CODE>  return v ;</CODE><BR><CODE>}</CODE><BR>

</PRE>

The function <TT>inspect</TT> tests whether its argument is an Objective CAML
integer. If so, it prints the integer twice, first viewed as a C long
integer (without conversion), then converted by the <TT>Long_val</TT>
macro, which extracts the actual integer represented in the argument.<BR>
<BR>
In the following example, we see that the machine representation of
integers in Objective CAML differs from that of C:


<PRE><BR># <B>external</B><CODE> </CODE>inspect<CODE> </CODE><CODE>:</CODE><CODE> </CODE>'a<CODE> </CODE>-&gt;<CODE> </CODE>'a<CODE> </CODE><CODE>=</CODE><CODE> </CODE><CODE>"inspect"</CODE><CODE> </CODE>;;<BR><CODE>external inspect : 'a -&gt; 'a = "inspect"</CODE><BR># inspect<CODE> </CODE><CODE>1</CODE><CODE>2</CODE><CODE>3</CODE><CODE> </CODE>;;<BR><CODE>v is an integer (247) : 123   - : int = 123</CODE><BR># inspect<CODE> </CODE>max_int;;<BR><CODE>v is an integer (2147483647) : 1073741823   - : int = 1073741823</CODE><BR>

</PRE>

We can also inspect values of other predefined types, such as
<I>char</I> and <I>bool</I>:


<PRE><BR># inspect<CODE> </CODE><CODE>'A'</CODE><CODE> </CODE>;;<BR><CODE>v is an integer (131) : 65   - : char = 'A'</CODE><BR># inspect<CODE> </CODE><B>true</B><CODE> </CODE>;;<BR><CODE>v is an integer (3) : 1   - : bool = true</CODE><BR># inspect<CODE> </CODE><B>false</B><CODE> </CODE>;;<BR><CODE>v is an integer (1) : 0   - : bool = false</CODE><BR># inspect<CODE> </CODE>[]<CODE> </CODE>;;<BR><CODE>v is an integer (1) : 0   - : '_a list = []</CODE><BR>

</PRE>
<BR>
<BR>
Consider the Objective CAML type <TT>foo</TT> defined thus:


<PRE><BR># <B>type</B><CODE> </CODE>foo<CODE> </CODE><CODE>=</CODE><CODE> </CODE>C1<CODE> </CODE><CODE>|</CODE><CODE> </CODE>C2<CODE> </CODE><B>of</B><CODE> </CODE>int<CODE> </CODE><CODE>|</CODE><CODE> </CODE>C3<CODE> </CODE><CODE>|</CODE><CODE> </CODE>C4<CODE> </CODE>;;<BR>

</PRE>
<BR>
<BR>
The <TT>inspect</TT> function shows that constant constructors and
non-constant constructors of this type are represented differently:


<PRE><BR># inspect<CODE> </CODE>C1<CODE> </CODE>;;<BR><CODE>v is an integer (1) : 0   - : foo = C1</CODE><BR># inspect<CODE> </CODE>C4<CODE> </CODE>;;<BR><CODE>v is an integer (5) : 2   - : foo = C4</CODE><BR># inspect<CODE> </CODE><TT>(</TT>C2<CODE> </CODE><CODE>1</CODE><TT>)</TT><CODE> </CODE>;;<BR><CODE>v is a pointer   - : foo = C2 1</CODE><BR>

</PRE>
<BR>
<BR>
When the function <TT>inspect</TT> detects an immediate value, it first
prints the "physical" representation of this value
(i.e. the representation viewed as a word-sized C integer of
type <TT>long</TT>); then it prints the "logical" contents of this
value (i.e. the Objective CAML integer it represents, as returned by
the decoding macro <TT>Long_val</TT>). The examples above show that the
"physical" and the "logical" contents differ: in each case the physical
value is 2<I>n</I>+1, where <I>n</I> is the logical value. This difference is
due to the tag bit<A NAME="text33" HREF="book-ora122.html#note33"><SUP><FONT SIZE=2>6</FONT></SUP></A> used by the garbage collector to distinguish immediate values
from pointers (see the <A HREF="book-ora086.html#GC-bdt">relevant section</A> of chapter&nbsp;<A HREF="index.html#chap-GC">9</A>).<BR>
<BR>
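The relationship between the two numbers printed by <TT>inspect</TT> can be reproduced in plain C. The sketch below is only a model, not the runtime's own code: the names <TT>toy_value</TT>, <TT>toy_val_long</TT>, <TT>toy_long_val</TT>, <TT>toy_is_long</TT> and <TT>toy_is_block</TT> are hypothetical, and the model assumes that an integer <I>n</I> is stored as 2<I>n</I>+1, so that the lowest bit of the word is the tag bit.

```c
#include <stdint.h>

/* Hypothetical model of the uniform representation: one machine word. */
typedef intptr_t toy_value;

/* Encode a C integer: shift left by one bit and set the low (tag) bit. */
toy_value toy_val_long(long n)
{
  return (toy_value)((((uintptr_t)n) << 1) | 1u);
}

/* Decode: shifting right drops the tag bit.  This relies on >> being an
   arithmetic shift for negative numbers, which holds on common compilers. */
long toy_long_val(toy_value v)
{
  return (long)(v >> 1);
}

/* An immediate value has its low bit set, a pointer has it cleared. */
int toy_is_long(toy_value v)  { return (int)(v & 1); }
int toy_is_block(toy_value v) { return (v & 1) == 0; }
```

In this model, <TT>toy_val_long(123)</TT> yields 247 and <TT>toy_val_long('A')</TT> yields 131, which are the physical values printed in the examples above.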
<A NAME="toc153"></A>
<H3> Representation of structured values</H3>
<A NAME="@concepts264"></A>
Non-immediate Objective CAML values are said to be structured values.
Those values are allocated in the Objective CAML heap and represented as a
pointer to the corresponding memory block. All memory blocks contain
a header word indicating the kind of the block as well as its size
expressed in machine words. Figure <A HREF="book-ora115.html#fig-com3-C">12.5</A> shows the
structure of a block for a 32-bit machine.
<BLOCKQUOTE><DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV>
<DIV ALIGN=center>
<IMG SRC="book-ora044.gif">
</DIV>
<BR>
<DIV ALIGN=center>Figure 12.5: Structure of an Objective CAML heap block.</DIV><BR>

<A NAME="fig-com3-C"></A>
<DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV></BLOCKQUOTE>
The two "color" bits are used by the garbage collector for walking
the memory graph (see the <A HREF="book-ora086.html#GC-mark-sweep">relevant section</A> of
chapter&nbsp;<A HREF="index.html#chap-GC">9</A>). The "tag" field, or "tag" for short,
contains the kind of the block. The "size" field contains the size
of the block, in words, excluding the header.
The macros listed in figure <A HREF="book-ora115.html#fig-bloc-C">12.6</A> return the tag and size
of a block.
<BLOCKQUOTE><DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV>
<DIV ALIGN=center>
<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=1>
<TR><TD  ALIGN=left NOWRAP>&nbsp;</TD>
<TD  ALIGN=left NOWRAP>&nbsp;</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Wosize_val(v)</TT></TD>
<TD  ALIGN=left NOWRAP>return the size of the block <TT>v</TT> (header excluded)</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Tag_val(v)</TT></TD>
<TD  ALIGN=left NOWRAP>return the tag of the block <TT>v</TT></TD>
</TR></TABLE>
</DIV>
<BR>
<DIV ALIGN=center>Figure 12.6: Accessing header information in memory blocks.</DIV><BR>

<A NAME="fig-bloc-C"></A>
<DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV></BLOCKQUOTE>
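To make the role of the header word concrete, here is a small sketch that packs and unpacks a 32-bit header. It assumes a particular bit layout (tag in the low 8 bits, the two color bits just above, size in the remaining 22 high bits); that layout and all the <TT>toy_</TT> names are assumptions made for illustration. Real C code should rely on <TT>Wosize_val</TT> and <TT>Tag_val</TT> rather than on any fixed layout.

```c
#include <stdint.h>

/* One 32-bit header word, under the layout assumed above. */
typedef uint32_t toy_header;

/* Pack a size (in words), a color and a tag into a header word. */
toy_header toy_make_header(uint32_t wosize, unsigned color, unsigned tag)
{
  return (wosize << 10) | ((uint32_t)(color & 3u) << 8) | (uint32_t)(tag & 0xFFu);
}

/* Extract each field again. */
uint32_t toy_wosize(toy_header h) { return h >> 10; }
unsigned toy_color(toy_header h)  { return (unsigned)((h >> 8) & 3u); }
unsigned toy_tag(toy_header h)    { return (unsigned)(h & 0xFFu); }
```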
The tag of a memory block can take the values listed in figure
<A HREF="book-ora115.html#fig-tag-C">12.7</A>. 
<BLOCKQUOTE><DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV>
<DIV ALIGN=center>
<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=1>
<TR><TD  ALIGN=left NOWRAP>from <TT>0</TT> to <TT>No_scan_tag-1</TT></TD>
<TD  ALIGN=left NOWRAP>an array of Objective CAML value representations</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Closure_tag</TT></TD>
<TD  ALIGN=left NOWRAP>a function closure</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>String_tag</TT></TD>
<TD  ALIGN=left NOWRAP>a character string</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Double_tag</TT></TD>
<TD  ALIGN=left NOWRAP>a double-precision float</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Double_array_tag</TT></TD>
<TD  ALIGN=left NOWRAP>an array of float</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Abstract_tag</TT></TD>
<TD  ALIGN=left NOWRAP>an abstract data type</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Final_tag</TT></TD>
<TD  ALIGN=left NOWRAP>an abstract data type equipped with a finalization function</TD>
</TR></TABLE>
</DIV>
<BR>
<DIV ALIGN=center>Figure 12.7: Tags of memory blocks.</DIV><BR>

<A NAME="fig-tag-C"></A>
<DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV></BLOCKQUOTE>
Depending on the block tag, different macros are used to access the
contents of the blocks. These macros are described in figure
<A HREF="book-ora115.html#fig-acces-bloc-C">12.8</A>. When the tag is less than
<TT>No_scan_tag</TT>, the heap block is structured as an array of
Objective CAML value representations. Each element of the array is called a
"field" of the memory block. In accordance with C and Objective CAML
conventions, the first field is at index 0, and the last field is at
index <TT>Wosize_val(v) - 1</TT>. 
<BLOCKQUOTE><DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV>
<DIV ALIGN=center>
<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=1>
<TR><TD  ALIGN=left NOWRAP>&nbsp;</TD>
<TD  ALIGN=left NOWRAP>&nbsp;</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Field(v,n)</TT></TD>
<TD  ALIGN=left NOWRAP>return the <TT>n</TT><SUP><FONT SIZE=2>th</FONT></SUP> field of
<TT>v</TT>.</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Code_val(v)</TT></TD>
<TD  ALIGN=left NOWRAP>return the code pointer for a closure.</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>string_length(v)</TT></TD>
<TD  ALIGN=left NOWRAP>return the length of a string.</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Byte(v,n)</TT></TD>
<TD  ALIGN=left NOWRAP>return the <TT>n</TT><SUP><FONT SIZE=2> th</FONT></SUP> character
of a string, with C type <TT>char</TT>.</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Byte_u(v,n)</TT></TD>
<TD  ALIGN=left NOWRAP>same, but result has C type <TT>unsigned char</TT>.</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>String_val(v)</TT></TD>
<TD  ALIGN=left NOWRAP>return the contents of a string with C type
 <TT>(char *)</TT>.</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Double_val(v)</TT></TD>
<TD  ALIGN=left NOWRAP>return the float contained in <TT>v</TT>.</TD>
</TR>
<TR><TD  ALIGN=left NOWRAP><TT>Double_field(v,n)</TT></TD>
<TD  ALIGN=left NOWRAP>return the <TT>n</TT><SUP><FONT SIZE=2> th</FONT></SUP>
float contained in the float array <TT>v</TT>.</TD>
</TR></TABLE>
</DIV>
<BR>
<DIV ALIGN=center>Figure 12.8: Accessing the content of a memory block.</DIV><BR>

<A NAME="fig-acces-bloc-C"></A>
<DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV></BLOCKQUOTE>As we did earlier for immediate values, we now define a function to
inspect memory blocks. The C function <TT>print_block</TT> takes an
Objective CAML value representation, tests whether it is an immediate value
or a memory block, and in the latter case prints the kind and contents
of the block. It is called from the wrapper function
<TT>inspect_block</TT>, which can be called from Objective CAML.<BR>
<BR>


<PRE>
<CODE>#include &lt;stdio.h&gt;</CODE><BR><CODE>#include &lt;caml/mlvalues.h&gt;</CODE><BR><BR><CODE>/* Print n dots, used as indentation. */</CODE><BR><CODE>void margin (int n)</CODE><BR><CODE>  { while (n-- &gt; 0) printf(".");  return; }</CODE><BR><BR><CODE>/* Print the kind and contents of v, indented by m dots. */</CODE><BR><CODE>void print_block (value v,int m) </CODE><BR><CODE>{</CODE><BR><CODE>  int size, i;</CODE><BR><CODE>  margin(m);</CODE><BR><CODE>  if (Is_long(v)) </CODE><BR><CODE>    { printf("immediate value (%ld)\n", (long) Long_val(v));  return; }</CODE><BR><CODE>  printf ("memory block: size=%d  -  ", size=Wosize_val(v));</CODE><BR><CODE>  switch (Tag_val(v))</CODE><BR><CODE>   {</CODE><BR><CODE>    case Closure_tag : </CODE><BR><CODE>        printf("closure with %d free variables\n", size-1);</CODE><BR><CODE>        margin(m+4); printf("code pointer: %p\n",(void *) Code_val(v)) ;</CODE><BR><CODE>        for (i=1;i&lt;size;i++)  print_block(Field(v,i), m+4);</CODE><BR><CODE>        break;</CODE><BR><CODE>    case String_tag :</CODE><BR><CODE>        printf("string: %s (%s)\n", String_val(v),(char *) v);  </CODE><BR><CODE>        break;</CODE><BR><CODE>    case Double_tag:  </CODE><BR><CODE>        printf("float: %g\n", Double_val(v));</CODE><BR><CODE>        break;</CODE><BR><CODE>    case Double_array_tag : </CODE><BR><CODE>        printf ("float array: "); </CODE><BR><CODE>        for (i=0;i&lt;size/Double_wosize;i++)  printf("  %g", Double_field(v,i));</CODE><BR><CODE>        printf("\n");</CODE><BR><CODE>        break;</CODE><BR><CODE>    case Abstract_tag : printf("abstract type\n"); break;</CODE><BR><CODE>    case Final_tag : printf("abstract finalized type\n"); break;</CODE><BR><CODE>    default:  </CODE><BR><CODE>        if (Tag_val(v)&gt;=No_scan_tag) { printf("unknown tag\n"); break; } </CODE><BR><CODE>        printf("structured block (tag=%d):\n",Tag_val(v));</CODE><BR><CODE>        for (i=0;i&lt;size;i++)  print_block(Field(v,i),m+4);</CODE><BR><CODE>   }</CODE><BR><CODE>  return ;</CODE><BR><CODE>}</CODE><BR><BR><CODE>value inspect_block (value v)  </CODE><BR><CODE>  { print_block(v,4); fflush(stdout); return v; }</CODE><BR><BR>

</PRE>
 
Each possible tag for a block corresponds to a case of the
<TT>switch</TT> construct. In the case of a block containing an array
of Objective CAML values, we recursively call <TT>print_block</TT> on each
field of the array. We then redefine the <TT>inspect</TT> function:


<PRE><BR># <B>external</B><CODE> </CODE>inspect<CODE> </CODE><CODE>:</CODE><CODE> </CODE>'a<CODE> </CODE>-&gt;<CODE> </CODE>'a<CODE> </CODE><CODE>=</CODE><CODE> </CODE><CODE>"inspect_block"</CODE><CODE> </CODE>;;<BR><CODE>external inspect : 'a -&gt; 'a = "inspect_block"</CODE><BR>

</PRE>

We can now explore the representations of Objective CAML structured values.
We must be careful not to apply <TT>inspect_block</TT> to a cyclic value,
since the recursive traversal of the value would then loop indefinitely.<BR>
<BR>
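One simple way to guard a recursive traversal against cyclic values is to bound the recursion depth. The sketch below illustrates the idea on a toy model of blocks; it does not use the Objective CAML runtime, and <TT>struct toy_block</TT>, <TT>toy_count_blocks</TT> and <TT>toy_cyclic_demo</TT> are hypothetical names. The same extra depth parameter could be added to a function such as <TT>print_block</TT>.

```c
#include <stddef.h>

/* Hypothetical toy model of a structured block: a size and an array of
   pointers to sub-blocks (NULL stands for an immediate value). */
struct toy_block {
  size_t size;
  struct toy_block *fields[2];
};

/* Count the blocks visited from b, descending at most max_depth levels.
   A block reached twice is counted twice; the bound only guarantees
   termination, it does not detect sharing. */
size_t toy_count_blocks(const struct toy_block *b, int max_depth)
{
  size_t i, n = 1;
  if (b == NULL || max_depth <= 0) return 0;
  for (i = 0; i < b->size; i++)
    n += toy_count_blocks(b->fields[i], max_depth - 1);
  return n;
}

/* Build the analogue of "let rec l = 1 :: l" (a cell whose tail is the
   cell itself) and traverse it with a depth bound. */
size_t toy_cyclic_demo(int max_depth)
{
  struct toy_block cell = { 2, { NULL, NULL } };
  cell.fields[1] = &cell;
  return toy_count_blocks(&cell, max_depth);
}
```

Without the bound, the traversal of this cell would never terminate; with it, <TT>toy_cyclic_demo(5)</TT> stops after five visits.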

<H4> Arrays, tuples, and records</H4>
Arrays and tuples are represented by structured blocks. The 
<I>n</I><SUP><FONT SIZE=2>th</FONT></SUP> field of the block contains the
representation of the <I>n</I><SUP><FONT SIZE=2>th</FONT></SUP> element of the
array or tuple.


<PRE><BR># inspect<CODE> </CODE><CODE>[|</CODE><CODE> </CODE><CODE>1</CODE>;<CODE> </CODE><CODE>2</CODE>;<CODE> </CODE><CODE>3</CODE><CODE> </CODE><CODE>|]</CODE><CODE> </CODE>;;<BR><CODE>....memory block: size=3  -  structured block (tag=0):</CODE><BR><CODE>........immediate value (1)</CODE><BR><CODE>........immediate value (2)</CODE><BR><CODE>........immediate value (3)</CODE><BR><CODE>- : int array = [|1; 2; 3|]</CODE><BR># inspect<CODE> </CODE><TT>(</TT><CODE> </CODE><CODE>1</CODE><CODE>0</CODE><CODE> </CODE><CODE>,</CODE><CODE> </CODE><B>true</B><CODE> </CODE><CODE>,</CODE><CODE> </CODE>()<CODE> </CODE><TT>)</TT><CODE> </CODE>;;<BR><CODE>....memory block: size=3  -  structured block (tag=0):</CODE><BR><CODE>........immediate value (10)</CODE><BR><CODE>........immediate value (1)</CODE><BR><CODE>........immediate value (0)</CODE><BR><CODE>- : int * bool * unit = 10, true, ()</CODE><BR>

</PRE>
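The layout of the tuple <TT>(10, true, ())</TT> shown above can be imitated in plain C. This is a model under stated assumptions, not runtime code: the <TT>TOY_</TT> macros are hypothetical stand-ins for <TT>Val_long</TT>, <TT>Long_val</TT>, <TT>Wosize_val</TT> and <TT>Field</TT>, integers are assumed to be encoded as 2<I>n</I>+1, and the header is assumed to sit in the word just before field 0 with the size in its high bits.

```c
#include <stddef.h>
#include <stdint.h>

typedef intptr_t toy_value;   /* hypothetical model of the value type */

#define TOY_VAL_LONG(n)  ((toy_value)((((uintptr_t)(n)) << 1) | 1u))
#define TOY_LONG_VAL(v)  ((long)((v) >> 1))
#define TOY_WOSIZE(b)    ((size_t)(((uintptr_t)((b)[-1])) >> 10))
#define TOY_FIELD(b, i)  ((b)[(i)])

/* Build a model of the block for (10, true, ()) and return the decoded
   contents of field i, or -1 if i is past the last field. */
long toy_tuple_field(size_t i)
{
  toy_value storage[4];
  toy_value *block = storage + 1;                 /* points at field 0   */
  storage[0] = (toy_value)((uintptr_t)3 << 10);   /* header: size 3, tag 0 */
  block[0] = TOY_VAL_LONG(10);                    /* 10                  */
  block[1] = TOY_VAL_LONG(1);                     /* true                */
  block[2] = TOY_VAL_LONG(0);                     /* ()                  */
  if (i >= TOY_WOSIZE(block)) return -1;
  return TOY_LONG_VAL(TOY_FIELD(block, i));
}
```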
<BR>
<BR>
Records are also represented as structured blocks. The values of the
record fields appear in the order given at record declaration time.
Mutable fields and immutable fields are represented identically.


<PRE><BR># <B>type</B><CODE> </CODE>foo<CODE> </CODE><CODE>=</CODE><CODE> </CODE>{<CODE> </CODE>fld1<CODE>:</CODE><CODE> </CODE>int<CODE> </CODE>;<CODE> </CODE><B>mutable</B><CODE> </CODE>fld2<CODE>:</CODE><CODE> </CODE>int<CODE> </CODE>}<CODE> </CODE>;;<BR><CODE>type foo = { fld1: int; mutable fld2: int }</CODE><BR># inspect<CODE> </CODE>{<CODE> </CODE>fld1<CODE>=</CODE><CODE>1</CODE><CODE>0</CODE><CODE> </CODE>;<CODE> </CODE>fld2<CODE>=</CODE><CODE>2</CODE><CODE>0</CODE><CODE> </CODE>}<CODE> </CODE>;;<BR><CODE>....memory block: size=2  -  structured block (tag=0):</CODE><BR><CODE>........immediate value (10)</CODE><BR><CODE>........immediate value (20)</CODE><BR><CODE>- : foo = {fld1=10; fld2=20}</CODE><BR>

</PRE>
<BR>
<BR>


<H3> Warning </H3> <HR>

Nothing prevents a C function from physically modifying an immutable
record field. It is the programmers' responsibility to make sure that
their C functions do not introduce inconsistencies in Objective CAML data
structures.


<HR>

<BR>
<BR>

<H4> Sum types</H4>
<A NAME="IC-types-sommes"></A>
<A NAME="@concepts265"></A>
We previously saw that constant constructors are represented like
integers. A non-constant constructor is represented by a block
containing the constructor's arguments, with a tag identifying the
constructor. The tag associated with a non-constant constructor
represents its position in the type declaration: the first
non-constant constructor has tag 0, the second one has tag 1, and so on.


<PRE><BR># <B>type</B><CODE> </CODE>foo<CODE> </CODE><CODE>=</CODE><CODE> </CODE>C1<CODE> </CODE><B>of</B><CODE> </CODE>int<CODE> </CODE><CODE>*</CODE><CODE> </CODE>int<CODE> </CODE><CODE>*</CODE><CODE> </CODE>int<CODE> </CODE><CODE>|</CODE><CODE> </CODE>C2<CODE> </CODE><B>of</B><CODE> </CODE>int<CODE> </CODE><CODE>|</CODE><CODE> </CODE>C3<CODE> </CODE><CODE>|</CODE><CODE> </CODE>C4<CODE> </CODE><B>of</B><CODE> </CODE>int<CODE> </CODE><CODE>*</CODE><CODE> </CODE>int<CODE> </CODE>;;<BR><CODE>type foo = | C1 of int * int * int | C2 of int | C3 | C4 of int * int</CODE><BR># inspect<CODE> </CODE><TT>(</TT>C1<CODE> </CODE><TT>(</TT><CODE>1</CODE><CODE>,</CODE><CODE>2</CODE><CODE>,</CODE><CODE>3</CODE><TT>)</TT><TT>)</TT><CODE> </CODE>;;<BR><CODE>....memory block: size=3  -  structured block (tag=0):</CODE><BR><CODE>........immediate value (1)</CODE><BR><CODE>........immediate value (2)</CODE><BR><CODE>........immediate value (3)</CODE><BR><CODE>- : foo = C1 (1, 2, 3)</CODE><BR># inspect<CODE> </CODE><TT>(</TT>C4<CODE> </CODE><TT>(</TT><CODE>1</CODE><CODE>,</CODE><CODE>2</CODE><TT>)</TT><TT>)</TT><CODE> </CODE>;;<BR><CODE>....memory block: size=2  -  structured block (tag=2):</CODE><BR><CODE>........immediate value (1)</CODE><BR><CODE>........immediate value (2)</CODE><BR><CODE>- : foo = C4 (1, 2)</CODE><BR>

</PRE>
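Constant and non-constant constructors are therefore numbered independently. The following sketch shows how a C function might name the constructor of the type <TT>foo</TT> above from what it observes. The helper <TT>toy_foo_constructor</TT> is hypothetical: its first argument plays the role of the result of <TT>Is_long(v)</TT>, and its second argument stands for <TT>Long_val(v)</TT> when the value is immediate, or <TT>Tag_val(v)</TT> when it is a block.

```c
#include <string.h>

/* Name the constructor of
     type foo = C1 of int * int * int | C2 of int | C3 | C4 of int * int
   is_immediate: non-zero if the value is an immediate value;
   n: constant-constructor number (immediate) or block tag (pointer). */
const char *toy_foo_constructor(int is_immediate, int n)
{
  /* C3 is the only constant constructor, so it is number 0. */
  static const char *const constant[] = { "C3" };
  /* Non-constant constructors are numbered separately, in order. */
  static const char *const non_constant[] = { "C1", "C2", "C4" };

  if (is_immediate)
    return (n == 0) ? constant[0] : "?";
  return (n >= 0 && n < 3) ? non_constant[n] : "?";
}
```

Note that the number 0 designates <TT>C3</TT> when the value is immediate, but <TT>C1</TT> when it is the tag of a block.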
<BR>
<BR>


<H3> Note </H3> <HR>

The type <I>list</I> is a sum type whose declaration is:<BR><B>type</B><CODE> </CODE>'a<CODE> </CODE>list<CODE> </CODE><CODE>=</CODE><CODE> </CODE>[]<CODE> </CODE><CODE>|</CODE><CODE> </CODE>::<CODE> </CODE><B>of</B><CODE> </CODE>'a<CODE> </CODE><CODE>*</CODE><CODE> </CODE>'a<CODE> </CODE>list. 
This type has only one non-constant constructor (::).
Thus, a non-empty list is represented by a memory block with tag 0.


<HR>

<BR>
<BR>

<H4> Character strings</H4>
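<A NAME="@concepts266"></A>Characters inside strings occupy one byte each. Thus, the memory
block representing a string uses one word per group of four characters
(on a 32-bit machine) or eight characters (on a 64-bit machine). The last
word always contains at least one padding byte, so a string whose length
is an exact multiple of the word size occupies one additional word.<BR>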
<BR>


<H3> Warning </H3> <HR>

Objective CAML strings can contain the null character whose ASCII 
code is 0. In C, the null character represents the end of a
string, and cannot appear inside a string.


<HR>

<BR>
<BR>


<PRE>
<CODE>#include &lt;stdio.h&gt;</CODE><BR><CODE>#include &lt;caml/mlvalues.h&gt;</CODE><BR><BR><CODE>/* Print every byte of the block holding the string, padding included. */</CODE><BR><CODE>value explore_string (value v)</CODE><BR><CODE>{</CODE><BR><CODE>  unsigned char *s;</CODE><BR><CODE>  int i,size;</CODE><BR><CODE>  s = (unsigned char *) v;</CODE><BR><CODE>  size = Wosize_val(v) * sizeof(value);</CODE><BR><CODE>  for (i=0;i&lt;size;i++) </CODE><BR><CODE>    {</CODE><BR><CODE>      unsigned int p = s[i] ;</CODE><BR><CODE>      if ((p&gt;31) &amp;&amp; (p&lt;128)) printf("%c",s[i]); else printf("(#%u)",p);</CODE><BR><CODE>    }</CODE><BR><CODE>  printf("\n");</CODE><BR><CODE>  fflush(stdout);</CODE><BR><CODE>  return v;</CODE><BR><CODE>}</CODE><BR><BR>

</PRE>

The length and the position of the last character of an Objective CAML string
are determined not by looking for a terminating null character, as
in C, but by combining the size of the memory block that contains the
string with the last byte of the last word of this block, which
indicates the number of <EM>unused</EM> bytes in the last word. The
following examples clarify the role played by this last byte.


<PRE><BR># <B>external</B><CODE> </CODE>explore<CODE> </CODE><CODE>:</CODE><CODE> </CODE>string<CODE> </CODE>-&gt;<CODE> </CODE>string<CODE> </CODE><CODE>=</CODE><CODE> </CODE><CODE>"explore_string"</CODE><CODE> </CODE>;;<BR><CODE>external explore : string -&gt; string = "explore_string"</CODE><BR># ignore<TT>(</TT>explore<CODE> </CODE><CODE>""</CODE><TT>)</TT>;<CODE> </CODE><BR><CODE> </CODE>ignore<TT>(</TT>explore<CODE> </CODE><CODE>"a"</CODE><TT>)</TT>;<BR><CODE> </CODE>ignore<TT>(</TT>explore<CODE> </CODE><CODE>"ab"</CODE><TT>)</TT>;<BR><CODE> </CODE>ignore<TT>(</TT>explore<CODE> </CODE><CODE>"abc"</CODE><TT>)</TT>;<BR><CODE> </CODE>ignore<TT>(</TT>explore<CODE> </CODE><CODE>"abcd"</CODE><TT>)</TT>;<BR><CODE> </CODE>ignore<TT>(</TT>explore<CODE> </CODE><CODE>"abcd\000"</CODE><TT>)</TT><CODE> </CODE>;;<BR><CODE>(#0)(#0)(#0)(#3)</CODE><BR><CODE>a(#0)(#0)(#2)</CODE><BR><CODE>ab(#0)(#1)</CODE><BR><CODE>abc(#0)</CODE><BR><CODE>abcd(#0)(#0)(#0)(#3)</CODE><BR><CODE>abcd(#0)(#0)(#0)(#2)</CODE><BR><CODE>- : unit = ()</CODE><BR>

</PRE>

In the last two examples (<TT>"abcd"</TT> and
<TT>"abcd</TT><TT>\</TT><TT>000"</TT>), the strings are of length
4 and 5 respectively.
This explains why the last byte takes two different values,
although the other bytes of the string representations are identical.<BR>
<BR>
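The rule just described amounts to a one-line computation. The sketch below applies it to a raw byte buffer rather than to a real Objective CAML value; the function name <TT>toy_string_length</TT> and its parameters are hypothetical, and C code linked with the runtime should use <TT>string_length</TT> instead.

```c
#include <stddef.h>

/* Length of a string stored in a block, computed as described above.
   bytes:      the raw contents of the block (header excluded);
   wosize:     the size of the block in words;
   word_bytes: the number of bytes per word (4 or 8). */
size_t toy_string_length(const unsigned char *bytes, size_t wosize,
                         size_t word_bytes)
{
  size_t total = wosize * word_bytes;   /* bytes in the block          */
  size_t last  = total - 1;             /* index of the very last byte */
  return last - bytes[last];            /* subtract the padding count  */
}
```

Applied to the 32-bit dumps above, the eight bytes <TT>abcd(#0)(#0)(#0)(#3)</TT> give 8 - 1 - 3 = 4, and <TT>abcd(#0)(#0)(#0)(#2)</TT> give 8 - 1 - 2 = 5.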

<H4> Floats and float arrays</H4>
<A NAME="@concepts267"></A>
<A NAME="@concepts268"></A>
<A NAME="sec-tab-float"></A>
Objective CAML offers only one type (<I>float</I>) of floating-point
numbers. This type corresponds to 64-bit, double-precision floating
point numbers in C (type <TT>double</TT>). Values of type <I>float</I>
are heap-allocated and represented by a memory block of size 2 words
(on a 32-bit machine) or 1 word (on a 64-bit machine).


<PRE><BR># inspect<CODE> </CODE><CODE>1</CODE><CODE>.</CODE><CODE>5</CODE><CODE> </CODE>;;<BR><CODE>....memory block: size=2  -  float: 1.5</CODE><BR><CODE>- : float = 1.5</CODE><BR># inspect<CODE> </CODE><CODE>0</CODE><CODE>.</CODE><CODE>0</CODE>;;<BR><CODE>....memory block: size=2  -  float: 0</CODE><BR><CODE>- : float = 0</CODE><BR>

</PRE>
<BR>
<BR>
Arrays of floats are represented specially to reduce their memory
occupancy: the floats contained in the array are stored consecutively
in the memory block, rather than having each float heap-allocated separately.
Therefore, float arrays possess a specific tag and specific access macros.


<PRE><BR># inspect<CODE> </CODE><CODE>[|</CODE><CODE> </CODE><CODE>1</CODE><CODE>.</CODE><CODE>5</CODE><CODE> </CODE>;<CODE> </CODE><CODE>2</CODE><CODE>.</CODE><CODE>5</CODE><CODE> </CODE>;<CODE> </CODE><CODE>3</CODE><CODE>.</CODE><CODE>5</CODE><CODE> </CODE><CODE>|]</CODE><CODE> </CODE>;;<BR><CODE>....memory block: size=6  -  float array:   1.5  2.5  3.5</CODE><BR><CODE>- : float array = [|1.5; 2.5; 3.5|]</CODE><BR>

</PRE>

This optimized representation encourages the use of Objective CAML for numerical
computations that manipulate many float arrays: operations on array
elements are much more efficient than if each float were heap-allocated
separately.


<H3> Warning </H3> <HR>

When allocating an Objective CAML float array from C, the size of the block
should be the number of array elements multiplied by
<TT>Double_wosize</TT>. The <TT>Double_wosize</TT> macro represents the
number of words occupied by a double-precision float
(2 words on a 32-bit machine, but only 1 word on a 64-bit machine).


<HR>
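The size computation in this warning can be checked with a few lines of C. The functions below are hypothetical helpers, not part of the runtime: the word size is passed as a parameter so that both the 32-bit and the 64-bit case can be tried on any machine, and a <TT>double</TT> is assumed to occupy 8 bytes.

```c
#include <stddef.h>

/* Number of words occupied by one double-precision float, for a given
   word size in bytes.  Assumes sizeof(double) == 8. */
size_t toy_double_wosize(size_t word_bytes)
{
  return sizeof(double) / word_bytes;
}

/* Size, in words, of the block needed for a float array of n_elems
   elements: the element count multiplied by the words per double. */
size_t toy_float_array_wosize(size_t n_elems, size_t word_bytes)
{
  return n_elems * toy_double_wosize(word_bytes);
}
```

With 4-byte words, <TT>toy_float_array_wosize(3, 4)</TT> gives 6, which matches the <TT>size=6</TT> printed for the three-element array above; with 8-byte words the same array needs only 3 words.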

<BR>
<BR>
With the exception of float arrays, floating-point numbers contained
in other data structures are always treated as structured,
heap-allocated values. The following example shows the representation
of a list of floats.


<PRE><BR># inspect<CODE> </CODE><CODE>[</CODE><CODE> </CODE><CODE>3</CODE><CODE>.</CODE><CODE>1</CODE><CODE>4</CODE>;<CODE> </CODE><CODE>1</CODE><CODE>.</CODE><CODE>2</CODE>;<CODE> </CODE><CODE>7</CODE><CODE>.</CODE><CODE>6</CODE><CODE>]</CODE>;;<BR><CODE>....memory block: size=2  -  structured block (tag=0):</CODE><BR><CODE>........memory block: size=2  -  float: 3.14</CODE><BR><CODE>........memory block: size=2  -  structured block (tag=0):</CODE><BR><CODE>............memory block: size=2  -  float: 1.2</CODE><BR><CODE>............memory block: size=2  -  structured block (tag=0):</CODE><BR><CODE>................memory block: size=2  -  float: 7.6</CODE><BR><CODE>................immediate value (0)</CODE><BR><CODE>- : float list = [3.14; 1.2; 7.6]</CODE><BR>

</PRE>

The list is viewed as a block with size 2, containing its head
and its tail. The head of the list is a float, which is itself a block
of size 2 on the 32-bit machine used here.<BR>
<BR>

<H4> Closures</H4><A NAME="IC-fermeture"></A>
<A NAME="@concepts269"></A>
<A NAME="@concepts270"></A>
<A NAME="@concepts271"></A>
A function value is represented by the code to be executed when the
function is applied, and by its environment (see the
<A HREF="book-ora015.html#sec-fermeture">relevant section</A> of chapter&nbsp;<A HREF="index.html#chap-PF">2</A>). There are two ways to build a function
value: either by explicit abstraction
(as in <EM>fun x -&gt; x+1</EM>) or by partial application of a curried function
(as in <EM>(fun x -&gt; fun y -&gt; x+y) 1</EM>). <BR>
<BR>
The environment of a closure can contain three kinds of variables:
those declared globally, those declared locally, and the function
parameters already instantiated by a partial application.
The implementation treats those three kinds differently.
Global variables are stored in a global environment that is not
explicitly part of any closure. Local variables and instantiated
parameters can appear in closures, as we now illustrate.<BR>
<BR>
A closure with an empty environment is simply a memory block
containing a pointer to the code of the function:


<PRE><BR># <B>let</B><CODE> </CODE>f<CODE> </CODE><CODE>=</CODE><CODE> </CODE><B>fun</B><CODE> </CODE>x<CODE> </CODE>y<CODE> </CODE>z<CODE> </CODE>-&gt;<CODE> </CODE>x<CODE>+</CODE>y<CODE>+</CODE>z<CODE> </CODE>;;<BR><CODE>val f : int -&gt; int -&gt; int -&gt; int = &lt;fun&gt;</CODE><BR># inspect<CODE> </CODE>f<CODE> </CODE>;;<BR><CODE>....memory block: size=1  -  closure with 0 free variables</CODE><BR><CODE>........code pointer: 0x807308c</CODE><BR><CODE>- : int -&gt; int -&gt; int -&gt; int = &lt;fun&gt;</CODE><BR>

</PRE>

Functions with free local variables are represented by closures with
non-empty environments. Here, the closure contains both a pointer to
the code of the function, and the values of its free local variables.


<PRE><BR># <B>let</B><CODE> </CODE>g<CODE> </CODE><CODE>=</CODE><CODE> </CODE><B>let</B><CODE> </CODE>x<CODE> </CODE><CODE>=</CODE><CODE> </CODE><CODE>1</CODE><CODE> </CODE><B>and</B><CODE> </CODE>y<CODE> </CODE><CODE>=</CODE><CODE> </CODE><CODE>2</CODE><CODE> </CODE><B>in</B><CODE> </CODE><B>fun</B><CODE> </CODE>z<CODE> </CODE>-&gt;<CODE> </CODE>x<CODE>+</CODE>y<CODE>+</CODE>z<CODE> </CODE>;;<BR><CODE>val g : int -&gt; int = &lt;fun&gt;</CODE><BR># inspect<CODE> </CODE>g<CODE> </CODE>;;<BR><CODE>....memory block: size=3  -  closure with 2 free variables</CODE><BR><CODE>........code pointer: 0x8086450</CODE><BR><CODE>........immediate value (1)</CODE><BR><CODE>........immediate value (2)</CODE><BR><CODE>- : int -&gt; int = &lt;fun&gt;</CODE><BR>

</PRE>
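The idea of a code pointer paired with the values of the free variables can be imitated in C with a structure. The sketch below models the closure of <TT>g</TT>; it is an illustration only, with hypothetical names (<TT>struct toy_closure</TT>, <TT>toy_g_code</TT>, <TT>toy_apply</TT>, <TT>toy_g_demo</TT>), and it stores the captured integers as plain C <TT>long</TT> values rather than as encoded Objective CAML values.

```c
/* Hypothetical model of the closure for
     let g = let x = 1 and y = 2 in fun z -> x+y+z
   The first member is the code pointer, the following ones hold the
   values of the free variables x and y. */
struct toy_closure;
typedef long (*toy_code)(const struct toy_closure *self, long arg);

struct toy_closure {
  toy_code code;   /* the code to run when the function is applied */
  long env[2];     /* captured values of x and y                   */
};

/* Code of "fun z -> x+y+z": free variables are read from the closure. */
long toy_g_code(const struct toy_closure *self, long z)
{
  return self->env[0] + self->env[1] + z;
}

/* Applying a closure: call its code, passing the closure itself so that
   the code can reach its environment. */
long toy_apply(const struct toy_closure *c, long arg)
{
  return c->code(c, arg);
}

/* Build the model of g and apply it to z. */
long toy_g_demo(long z)
{
  struct toy_closure g = { toy_g_code, { 1, 2 } };
  return toy_apply(&g, z);
}
```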
<BR>
<BR>
The Objective CAML virtual machine treats partial applications of functions
specially for better performance. A partial application of an
abstraction is represented by a closure containing a value for each of
the instantiated parameters, plus a pointer to the closure for the
initial abstraction.


<PRE><BR># <B>let</B><CODE> </CODE>a1<CODE> </CODE><CODE>=</CODE><CODE> </CODE>f<CODE> </CODE><CODE>1</CODE><CODE> </CODE>;;<BR><CODE>val a1 : int -&gt; int -&gt; int = &lt;fun&gt;</CODE><BR># inspect<CODE> </CODE><TT>(</TT>a1<TT>)</TT><CODE> </CODE>;;<BR><CODE>....memory block: size=3  -  closure with 2 free variables</CODE><BR><CODE>........code pointer: 0x8073088</CODE><BR><CODE>........memory block: size=1  -  closure with 0 free variables</CODE><BR><CODE>............code pointer: 0x807308c</CODE><BR><CODE>........immediate value (1)</CODE><BR><CODE>- : int -&gt; int -&gt; int = &lt;fun&gt;</CODE><BR># <B>let</B><CODE> </CODE>a2<CODE> </CODE><CODE>=</CODE><CODE> </CODE>a1<CODE> </CODE><CODE>2</CODE><CODE> </CODE>;;<BR><CODE>val a2 : int -&gt; int = &lt;fun&gt;</CODE><BR># inspect<CODE> </CODE><TT>(</TT>a2<TT>)</TT><CODE> </CODE>;;<BR><CODE>....memory block: size=4  -  closure with 3 free variables</CODE><BR><CODE>........code pointer: 0x8073088</CODE><BR><CODE>........memory block: size=1  -  closure with 0 free variables</CODE><BR><CODE>............code pointer: 0x807308c</CODE><BR><CODE>........immediate value (1)</CODE><BR><CODE>........immediate value (2)</CODE><BR><CODE>- : int -&gt; int = &lt;fun&gt;</CODE><BR>

</PRE>

Figure <A HREF="book-ora115.html#fig-com4-C">12.9</A> depicts the result of the inspection above.
<BLOCKQUOTE><DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV>
<DIV ALIGN=center>
<IMG SRC="book-ora045.gif">
</DIV>
<BR>
<DIV ALIGN=center>Figure 12.9: Closure representation.</DIV><BR>

<A NAME="fig-com4-C"></A>
<DIV ALIGN=center><HR WIDTH="80%" SIZE=2></DIV></BLOCKQUOTE>The function <TT>f</TT> has no free variables, hence the environment
part of its closure is empty. The code pointer for a function with
several arguments points to the code that should be called when all
arguments are provided. In the case of <TT>f</TT>, this is the code
corresponding to <EM>x+y+z</EM>. Partial applications of this
function result in intermediate closures that all point to the same
shared code (<TT>a1</TT> and <TT>a2</TT> have the same code pointer). The role
of this code is to accumulate the arguments and detect when all
arguments have been provided. If so, it pushes all arguments and
calls the actual code for the function body; if not, it creates a new
closure. For instance, the application of <TT>a1</TT> to 2 fails to
provide all arguments to the function <TT>f</TT> (the last argument is
still missing), hence a closure is created containing the first two
arguments, 1 and 2. Notice that the closures resulting from partial
applications always contain, in the first environment slot, a pointer
to the original closure. The original closure will be called when all
arguments have been gathered.<BR>
<BR>
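The accumulation of arguments described above can be pictured with a small model in plain C. This is only a sketch: the type <TT>closure</TT>, the function <TT>apply</TT>, the variable <TT>f_closure</TT> and the fixed arity are hypothetical names chosen for illustration, and the code neither uses nor reproduces the actual runtime system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: none of these names come from the Objective CAML runtime. */
#define ARITY 3                       /* f expects three arguments */

typedef struct closure {
    int (*body)(const int *args);     /* code of the function body (NULL in partial closures) */
    const struct closure *original;   /* first environment slot: the original closure */
    int nargs;                        /* number of arguments accumulated so far */
    int args[ARITY];                  /* remaining environment slots */
} closure;

/* Body of f, corresponding to x+y+z. */
static int f_body(const int *args) { return args[0] + args[1] + args[2]; }

/* Closure for f itself: a code pointer and an empty environment. */
const closure f_closure = { f_body, NULL, 0, {0} };

/* Shared code for applications: accumulate one more argument.
   If arguments are still missing, return a new closure.
   Otherwise call the original body, store its result in *result,
   and return NULL. */
closure *apply(const closure *c, int arg, int *result)
{
    const closure *orig = c->original ? c->original : c;
    int all[ARITY];
    int n = c->nargs;

    assert(n < ARITY);
    memcpy(all, c->args, (size_t)n * sizeof all[0]);
    all[n++] = arg;

    if (n == ARITY) {                 /* all arguments gathered */
        *result = orig->body(all);
        return NULL;
    }

    closure *r = malloc(sizeof *r);   /* still partial: build a new closure */
    if (r == NULL) abort();
    r->body = NULL;                   /* the body is reached through `original` */
    r->original = orig;
    r->nargs = n;
    memcpy(r->args, all, (size_t)n * sizeof all[0]);
    return r;
}
```

In this model, applying <TT>f_closure</TT> to 1 and then the result to 2 yields two new closures whose <TT>original</TT> field points to <TT>f_closure</TT>; supplying 3 as the last argument runs the body and stores 6 in <TT>*result</TT>.<BR>
<BR>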
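Mixing local declarations and partial applications results in the
following representation. Here <TT>g 1</TT> supplies the only parameter
of <TT>g</TT> and returns the inner function, so <TT>a1</TT> is an
ordinary closure whose environment holds the values of <TT>x</TT> and
<TT>y</TT>: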


<PRE><BR># <B>let</B><CODE> </CODE>g<CODE> </CODE>x<CODE> </CODE><CODE>=</CODE><CODE> </CODE><B>let</B><CODE> </CODE>y<CODE>=</CODE><CODE>2</CODE><CODE> </CODE><B>in</B><CODE> </CODE><B>fun</B><CODE> </CODE>z<CODE> </CODE>-&gt;<CODE> </CODE>x<CODE>+</CODE>y<CODE>+</CODE>z<CODE> </CODE>;;<BR><CODE>val g : int -&gt; int -&gt; int = &lt;fun&gt;</CODE><BR># <B>let</B><CODE> </CODE>a1<CODE> </CODE><CODE>=</CODE><CODE> </CODE>g<CODE> </CODE><CODE>1</CODE><CODE> </CODE>;;<BR><CODE>val a1 : int -&gt; int = &lt;fun&gt;</CODE><BR># inspect<CODE> </CODE>a1<CODE> </CODE>;;<BR><CODE>....memory block: size=3  -  closure with 2 free variables</CODE><BR><CODE>........code pointer: 0x8086548</CODE><BR><CODE>........immediate value (1)</CODE><BR><CODE>........immediate value (2)</CODE><BR><CODE>- : int -&gt; int = &lt;fun&gt;</CODE><BR>

</PRE>
<BR>
<BR>

<H4> Abstract types</H4>
Values of an abstract type are represented like those of its
implementation type. Type information is used only during
type-checking and compilation. During execution, the types are not
needed: only the memory representation (tag bits on values, size and
tag fields on memory blocks) needs to be communicated to the garbage
collector.<BR>
<BR>
For instance, a value of the abstract type <I>'a Stack.t</I>
is represented as a reference to a list, since the type <I>'a Stack.t</I>
is implemented as <I>'a list ref</I>.


<PRE><BR># <B>let</B><CODE> </CODE>p<CODE> </CODE><CODE>=</CODE><CODE> </CODE>Stack.create();;<BR><CODE>val p : '_a Stack.t = &lt;abstr&gt;</CODE><BR># Stack.push<CODE> </CODE><CODE>3</CODE><CODE> </CODE>p;;<BR><CODE>- : unit = ()</CODE><BR># inspect<CODE> </CODE>p;;<BR><CODE>....memory block: size=1  -  structured block (tag=0):</CODE><BR><CODE>........memory block: size=2  -  structured block (tag=0):</CODE><BR><CODE>............immediate value (3)</CODE><BR><CODE>............immediate value (0)</CODE><BR><CODE>- : int Stack.t = &lt;abstr&gt;</CODE><BR>

</PRE>
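The tag bit on values and the size and tag fields on memory blocks can be sketched in plain C as follows. The helper names below are local stand-ins written for this illustration (a real C function would use the macros from the runtime include files rather than redefine them), and the header layout shown, with the size above a 2-bit field used by the garbage collector and an 8-bit tag, is assumed to be the classic one.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for illustration; not the runtime's own definitions. */
typedef intptr_t  value;
typedef uintptr_t header;

/* An immediate integer n is stored as 2n+1: the lowest bit is the tag bit. */
static inline value val_of_long(intptr_t n)
{
    return (value)(((uintptr_t)n << 1) | 1);
}

/* Decoding drops the tag bit (an arithmetic right shift is assumed). */
static inline intptr_t long_of_val(value v) { return v >> 1; }

/* A word with the lowest bit set is an immediate value, not a pointer. */
static inline int is_immediate(value v) { return (v & 1) != 0; }

/* Block header: size in words, then 2 bits for the garbage collector
   (left at zero here), then the tag in the lowest 8 bits. */
static inline header make_header(uintptr_t wosize, unsigned tag)
{
    assert(tag <= 0xFF);
    return (wosize << 10) | tag;
}

static inline uintptr_t header_size(header h) { return h >> 10; }
static inline unsigned  header_tag(header h)  { return (unsigned)(h & 0xFF); }
```

With these definitions, the integer 3 pushed on the stack is the word 7, the empty list at the end of the list (shown as immediate value 0 in the inspection) is the word 1, and the list cell holding them has a header with size 2 and tag 0.<BR>
<BR>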

On the other hand, some abstract types are implemented by
representations that cannot be expressed in Objective CAML. Typical
examples include arrays of weak pointers and input-output channels.
Often, values of those abstract types are represented as memory
blocks with tag <TT>Abstract_tag</TT>.


<PRE><BR># <B>let</B><CODE> </CODE>w<CODE> </CODE><CODE>=</CODE><CODE> </CODE>Weak.create<CODE> </CODE><CODE>1</CODE><CODE>0</CODE>;;<BR><CODE>val w : '_a Weak.t = &lt;abstr&gt;</CODE><BR># Weak.set<CODE> </CODE>w<CODE> </CODE><CODE>0</CODE><CODE> </CODE><TT>(</TT>Some<CODE> </CODE>p<TT>)</TT>;;<BR><CODE>- : unit = ()</CODE><BR># inspect<CODE> </CODE>w;;<BR><CODE>....memory block: size=11  -  abstract type</CODE><BR><CODE>- : int Stack.t Weak.t = &lt;abstr&gt;</CODE><BR>

</PRE>

Sometimes, a finalization function is attached to those values.
Finalization functions are C functions that the garbage collector
calls just before the value is collected. They are useful for
freeing external resources, such as an input-output buffer, just before
the memory block referring to those resources disappears. For
instance, inspection of the "standard output" channel reveals that
the type <I>out_channel</I> is represented by abstract memory
blocks with a finalization function:


<PRE><BR># inspect<CODE> </CODE><TT>(</TT>stdout<TT>)</TT><CODE> </CODE>;;<BR><CODE>....memory block: size=2  -  abstract finalized type</CODE><BR><CODE>- : out_channel = &lt;abstr&gt;</CODE><BR>

</PRE>
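The idea of a finalization function can be pictured with a toy model in plain C. Everything here is hypothetical and written for this illustration only: <TT>finalized_block</TT>, <TT>make_channel</TT>, <TT>release_buffer</TT> and <TT>collect</TT> do not belong to the runtime system, and <TT>collect</TT> merely stands in for the moment at which the garbage collector reclaims the block.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: not the runtime's representation of finalized blocks. */
typedef struct finalized_block {
    void (*finalize)(struct finalized_block *);  /* called just before collection */
    void *resource;                              /* external resource, e.g. an I/O buffer */
} finalized_block;

int buffers_released = 0;   /* counts finalizer calls, for demonstration */

/* Finalization function: free the external buffer. */
void release_buffer(finalized_block *b)
{
    free(b->resource);
    b->resource = NULL;
    buffers_released++;
}

/* Allocate a block that owns a buffer and attach its finalizer. */
finalized_block *make_channel(size_t buffer_size)
{
    finalized_block *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->resource = malloc(buffer_size);
    if (b->resource == NULL) abort();
    b->finalize = release_buffer;
    return b;
}

/* Stand-in for the collector: run the finalizer, then reclaim the block. */
void collect(finalized_block *b)
{
    assert(b != NULL);
    if (b->finalize != NULL)
        b->finalize(b);
    free(b);
}
```

Calling <TT>collect</TT> on a block built by <TT>make_channel</TT> first releases the buffer through <TT>release_buffer</TT> and only then frees the block itself, which is the ordering that makes finalization suitable for external resources.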
<BR>
<BR>
</BODY>
</HTML>
